Sensitive Micro Data Protection Using Latin Hypercube Sampling Technique
Abstract
We propose the use of Latin Hypercube Sampling to create a synthetic data set that reproduces many of the essential features of an original data set while providing disclosure protection. The synthetic micro data can also be used to create either additive or multiplicative noise which, when merged with the original data, can provide disclosure protection. The technique can also be used to create hybrid micro data sets containing pre-determined mixtures of real and synthetic data. We demonstrate the basic properties of the synthetic data approach by applying the Latin Hypercube Sampling technique to a database supported by the Energy Information Administration. The use of Latin Hypercube Sampling, along with the goal of reproducing the rank correlation structure instead of the Pearson correlation structure, has not previously been applied to the disclosure protection problem. Given its properties, this technique offers multiple alternatives to current methods for providing disclosure protection for large data sets.
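As a rough illustration of the idea in the abstract, the sketch below draws a Latin hypercube sample from the empirical marginals of an original data set and then reorders each column so that the synthetic data approximately reproduces the original rank (Spearman) correlation structure. The abstract does not give the authors' algorithm; the reordering step here is a simplified Iman-Conover style pairing chosen for illustration, and the names `lhs_synthetic`, `spearman`, and `rank` are hypothetical. It assumes continuous variables and a positive definite rank correlation matrix.

```python
import numpy as np

def rank(a):
    # Ordinal ranks 0..n-1 within each column (ties broken by position).
    return np.argsort(np.argsort(a, axis=0), axis=0)

def spearman(x):
    # Spearman rank correlation is the Pearson correlation of the ranks.
    return np.corrcoef(rank(x), rowvar=False)

def lhs_synthetic(original, n_samples, seed=0):
    """Illustrative sketch, not the authors' implementation."""
    rng = np.random.default_rng(seed)
    d = original.shape[1]

    # 1. Latin hypercube step: one uniform draw inside each of n_samples
    #    equal-probability strata, independently for every variable.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, d))) / n_samples

    # 2. Map the stratified uniforms through each empirical quantile
    #    function. Columns of `sample` come out sorted in ascending order.
    sample = np.column_stack(
        [np.quantile(original[:, j], u[:, j]) for j in range(d)]
    )

    # 3. Build normal scores whose Pearson correlation equals the target
    #    rank correlation (assumes the target is positive definite).
    target = spearman(original)
    z = rng.standard_normal((n_samples, d))
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    q = np.linalg.cholesky(np.corrcoef(z, rowvar=False))
    p = np.linalg.cholesky(target)
    scores = z @ np.linalg.inv(q).T @ p.T

    # 4. Reorder each sorted column so its ranks follow the score ranks.
    return np.take_along_axis(sample, rank(scores), axis=0)
```

Calling `lhs_synthetic(data, len(data))` on an `(n, d)` NumPy array returns an array of the same shape whose marginals follow the empirical quantiles of `data` and whose Spearman matrix is close to that of `data`. Because the rows are rebuilt from strata and scores instead of copied, no synthetic record corresponds directly to an original record, although this sketch by itself makes no formal disclosure-risk guarantee.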
Similar resources
Asymptotically Valid Confidence Intervals for Quantiles and Values-at-Risk When Applying Latin Hypercube Sampling
Quantiles, which are also known as values-at-risk in finance, are often used as risk measures. Latin hypercube sampling (LHS) is a variance-reduction technique (VRT) that induces correlation among the generated samples in such a way as to increase efficiency under certain conditions; it can be thought of as an extension of stratified sampling in multiple dimensions. This paper develops asymptot...
Commodity price uncertainty propagation in open-pit mine production planning by Latin hypercube sampling method
Production planning of an open-pit mine is a procedure during which the rock blocks are assigned to different production periods in a way that leads to the highest net present value (NPV) subject to some operational and technical constraints. This process becomes much more complicated by incorporation of the uncertainty existing in the input parameters. The commodity price uncertainty is among ...
USING LATIN HYPERCUBE SAMPLING BASED ON THE ANN-HPSOGA MODEL FOR ESTIMATION OF THE CREATION PROBABILITY OF DAMAGED ZONE AROUND UNDERGROUND SPACES
The excavation damaged zone (EDZ) can be defined as a rock zone where the rock properties and conditions have been changed by processes related to an excavation. This zone affects the behavior of the rock mass surrounding the construction, which reduces the stability and safety factor and increases the probability of failure of the structure. In this paper, a methodology was examined for computing...
Progressive Latin Hypercube Sampling: An efficient approach for robust sampling-based analysis of environmental models
Efficient sampling strategies that scale with the size of the problem, computational budget, and users’ needs are essential for various sampling-based analyses, such as sensitivity and uncertainty analysis. In this study, we propose a new strategy, called Progressive Latin Hypercube Sampling (PLHS), which sequentially generates sample points while progressively preserving the distributional pro...
A conditioned Latin hypercube method for sampling in the presence of ancillary information
This paper presents the conditioned Latin hypercube as a sampling strategy of an area with prior information represented as exhaustive ancillary data. Latin hypercube sampling (LHS) is a stratified random procedure that provides an efficient way of sampling variables from their multivariate distributions. It provides a full coverage of the range of each variable by maximally stratifying the mar...